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Abstract. A practical method to compute the Riemann zeta function is 
presented. The method can compute (,(1/2 + it) at any |T 1//4 J points in 
[T, T + T 1 / 4 ] using an average time of T 1 / 4 +°( 1 ) per point. This is the same 
complexity as the Odlyzko-Schonhage algorithm over that interval. Although 
the method far from competes with the Odlyzko-Schonhage algorithm over 
intervals much longer than T 1 / 4 , it still has the advantages of being elemen- 
tary, simple to implement, it does not use the fast Fourier transform or require 
large amounts of storage space, and its error terms are easy to control. The 
method has been implemented, and results of timing experiments agree with 
its theoretical amortized complexity of T 1 / 4 +°( 1 ) . 



1. Introduction 

The Riemann zeta function £(s) can be calculated using the Riemann-Siegel 
formula. A frequently stated version of that formula on the critical line is: let 

(1.1) Z(t) = e"<*>C(l/2 + it) , e"W = ( ^ + 1^ ) ^ . 

The rotation factor e* e W is chosen so that Z(t) is real. Let a := y/t/(2ir), n\ := [a\ 
be the integer part of a, and p := {a} = a — [a\ be the fractional part of a. Then 
for t > 27r, 



(1.2) 



Z(t) = » ( 2e-^ 2 — J + ^I7^ $ W 



a 

{— l)™i+ 2 
967r 2 a 3 / 2 



$( 3 )(p) + ^W 



where 

;= cob Mx'-^- 1/16) 
cos 27ra; 

and ^ 3 \x) is the third derivative of $(x) with respect to x. Gabcke |Gaj showed 
(1.4) \R(t)\ < .053i" 5/4 , for t > 200 , 
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which is sufficient for most applications. Odlyzko and Schonhage }OSj showed 
how to compute the rotation factor e~ l6tyt \ and the correction terms $(x) and 
$ (3) (a;), to within ±^ K " 10 , with n > fixed, using t *^ operations on numbers 
of K (\ogt) bits. Note throughout, asymptotic constants are taken as t — > oo , and 
the notations O k (.) and o re (.) mean asymptotic constants depend on the parameter 
k only. Therefore, to calculate the rotated zeta function Z(t), the bulk of the 
computational effort is in computing the sum 

711 At log n 

(1.5) F{t ):=J2—^, n x =LVV(^)J- 

By taking more correction terms in formula (|1.2p . one can arrange for the re- 
mainder term R(t) to be bounded by 0(t~ K ) for any fixed n > 0. Odlyzko and 
Schonhage [OSj showed the additional correction terms can also be computed to 
within ±t~ K in t° K ^ operations on numbers of K (logt) bits. 

Frequently, one is interested in numerically evaluating Z(t) at N points in an 
interval of the form t G [T,T + T a ], where a € [0, 1/2] say, and N is large. This 
is the case, for example, when one attempts to locate real zeros of Z(t) to within 
±T~ K , or study moments of the zeta function. A straightforward application of the 
Riemann-Siegel formula can do so in 7VT 1 / 2+ ° K ( 1 * ) operations. 

The purpose of this note is to improve the running time in the iV-aspect. So, 
although the proposed method still consumes iVs+M 1 ) time to evaluate Z(t) to 
within ±t~ K at a single point, it achieves substantially lower running times if one 
is interested in computing zeta at many points. This type of idea is not new: in 
the context of the Riemann zeta function, it dates back the algorithm of Odlyzko 
and Schonhage |OSj . 

The main feature of the proposed method is it completely avoids the two essen- 
tial components of the Odlyzko-Schonhage algorithm, which are the fast Fourier 
transform, and a sophisticated rational function evaluation algorithm. Instead, the 
method relies on a straightforward subdivision of the main sum in the Riemann- 
Siegel formula, a band-limited interpolation technique (see Appendix), and a direct 
evaluation to obtain the precomputation data. Therefore, its implementation is 
relatively straightforward, with friendly asymptotic constants, and its error terms 
are easy to bound. Also, it does not require large amounts of storage space for its 
precomputation data. The method far from competes with the Odlyzko-Schonhage 
algorithm in general. But in many situations, it achieves a similar complexity. 

The basic idea of the method is computing Z(t) for a lot of different, but neigh- 
boring, values of t involves many common steps. The method takes advantage of 
this to achieve lower running times. 

Before discussing the method any further, we make a few remarks. By computing 
zeta we mean to numerically evaluate Z(t) with "polynomial accuracy," that is, 
with an absolute error bounded by i _K , for any fixed k > 0. We measure the 
computational complexity of our method by the number of arithmetic operations 
required: additions, multiplications, divisions, complex exponential, and logarithm 
(involving numbers of K (logt) bits). That in turn can be routinely bounded by 
the number of bit operations. Lastly, as in the Odlyzko-Schonhage algorithm, the 
proposed method will generalize easily to any Dirichlet series: 
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(1.6) -j , s = cr + it with (j 6 [0, 1] say , 

n 

assuming the coefficients a n are known, or can be computed quickly. For definitc- 
ness, in the remainder of the paper, we specialize to the rotated zeta function on 
the critical line Z(t). Our main result is the following, 

Theorem 1.1. Given any fixed numbers a £ [0,1/2] and n > 0, there exists an 
algorithm that for every T > 100 will perform T 1 ^ 2+0k ^ operations on numbers 
of K (logT) bits using 7 1Q +°'=( 1 ) bits of storage, after which the algorithm will be 
capable of computing Z(t) at any t £ [T, T + T a ] to within ±T~ K in T a +°t( l ) 
operations. 

It is useful to compare the the algorithm of Theorem 11.11 with the Odlyzko- 
Schonhage algorithm (the statement below is equivalent to Theorem 5.1 in [OS] 
specialized to Z(tj): 

The Odlyzko-Schonhage algorithm 1.2. Given any a £ [0, 1/2], and any con- 
stants e and k, there is an effectively computable constant B — B(e,K,a), and an 
algorithm that for every T > will perform < BT 1 / 2+€ operations on numbers of 
< B log T bits using < BT a+t bits of storage and will then be capable of computing 
any value of Z(t) for T < t < T + T a to within ±T~ K in < BT e operations using 
the precomputed values. 

As mentioned earlier, the two central ingredients of the Odlyzko-Schonhage al- 
gorithm are the fast Fourier transform, and a rational function evaluation algo- 
rithm. Our method completely avoids these central ingredients. Instead, it relies 
on a straightforward subdivision of the main sum in the Riemann-Siegel formula, 
a band-limited interpolation technique (see Appendix), and a direct evaluation for 
the precomputation data. So, it is significantly simpler. 

We remark the Odlyzko-Schonhage algorithm has been implemented at least 
twice, by Odlyzko [Qj, and by Gourdon [Gj. Gourdon's implementation replaces 
the Odlyzko-Schonhage rational function algorithm by the Greengard-Rokhlin al- 
gorithm. This was suggested by Odlyzko as a possible improvement in [Oj . 

We carry out a comparison between the Odlyzko-Schonhage algorithm and the 
algorithm of Theorem 11.11 in the following context. Suppose we wish to evaluate 
Z(t) to within ±T~ K at about \T a \ points in the interval [T,T + T a ]. This is 
often the case in applications; for example, when one attempts to locate, to within 
±T~ K , the ordinates of non-trivial zeros of zeta on the critical line (notice these are 
the real zeros of Z(t)), one expects to require about T° K W evaluations of Z(t) per 
zero. Since there are T a +°( 1 ) zeros of Z(t) in the interval [T,T + T a ], then about 
ya+o K (i) eva i ua tions of Z(t) are needed in total. So, consider Table[TJ It compares 
the running times of the algorithm of Theorem 11.11 and the Odlyzko-Schonhage 
algorithm in such a situation. 

In Table [U the column "Single Eval." refers to the cost of a single evaluation of 
Z(t) after the precomputation has been carried out. The column "Average" refers 
to the average cost of evaluating Z(t) at all [T a \ points (in particular, the column 
"Average" takes the precomputation cost into account). To avoid notational clutter, 
the complexities listed in the table ignore little-o terms of the form T°M (In [OS] , 
it is stated the T £ term can be replaced by a fixed power of logT; nevertheless, it 
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Table 1. 



Algorithm 


Precomputation 


Storage 


Single Eval. 


Average 


Theorem 11.11 


T l/2 




rpty, 


rp\j1 — OL _|_ rpa 


Odlyzko-Schonhage 








rpl/2 — o: + e _|_ rptL 



is included in the table to be consistent with the formal statement of the Odlyzko- 
Schonhage algorithm). 

In view of the last column of the table, we see if a < 1/4, the algorithm of 
Theorem 11.11 achieves the same amortized complexity as the Odlyzko-Schonhage 
algorithm, but when a > 1/4 it does not. This suggests one should restrict a £ 
[0, 1/4]. More generally, we have the following corollary to Theorem ll.il 

Corollary 1.3. Given any fixed numbers a £ [0,1/2] and k > 0, there exists an 
algorithm that for every T > 100, will be capable of computing Z(t) to within ±T~ K 
at any \T a \ points in [T, T + T a ] using an average of 

{rpi/2~a+o K (i) operations per point ifO < a < 1/4, 
yi/4+o K (i) operations per point if 1/4 < a < 1/2, 

where the operations are performed on numbers of O k (log T) bits. The storage space 
requirement for the algorithm is 

( J ia +°«( 1 ) bits ifO < a < 1/4, 
| T i/4+o„(i) bits if 1/A< a <l/2. 

By comparison, to accomplish the same task as in the statement of the corollary, 
the Odlyzko-Schonhage algorithm requires j ll / 2 + e - a + ^«,a(i) operations per point 
on average, and it requires a storage space of 7 ia + £ +°«,".»( 1 ) bits (which is the same 
as corollary 11.31 when a £ [0, 1/4]). 

The statement of corollary 11.31 follows from a straightforward optimization pro- 
cedure. Specifically, for any fixed numbers 6 > 0, a £ [0, 1/2], and k > 0, such that 
S + a £ [0, 1/2], consider the successive intervals 

(1.7) {T + jT a ,T+(j + l)T a }, j = 0,l,...,\T 6 ]. 

Then, applying the algorithm of Theorem 11.11 a total of \T S ~\ + 1 times to these 
intervals, we deduce Z(t) can be computed to within ±T _K at any single point in 

(1.8) [T,T + T a+S ], a £ [0,1/2], 6>0, a + 8 £ [0, 1/2] , 

using 7 1q +°k( 1 ) operations, on numbers of K (logT) bits, provided a precompu- 
tation costing 7 ll / 2 + <5 +°«( 1 ) operations is performed. Notice the storage space re- 
quirement for the precomputation data can always be kept at T a+ ° K ^ bits. This 
is because the method will deal with one subinterval [T + jT a , T + (j + l)T a ] at 
a time, so the precomputation data for that subinterval can be discarded when the 
method is done there. 

Now, as in the statement of the corollary, let a £ [0, 1/2], and suppose we wish to 
compute Z{t) to within ±T" K at |T a J points in the interval [T, T+T a ]. Optimizing 
a and 6 to the above situation, we solve 
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(1.9) 



1/2 + 5 = a + a } 



and 



a + 5 = a , 



which has the solution a = 1/4 and 5 = a— 1/4. If a < 1/4, then 5 becomes negative. 
So in this case, we choose a = a and (5 = 0. For example, when a G [0, 1/4), the 
method requires 7 ll / 2 + l5 + ~( 1 ) = yi/2+o K (i) operations on numbers of K (log T) bits 
to compute Z(t) to within ±T~ K at all the \T a \ points in the interval [T, T + T a ]. 
This amounts to an average of 7 ll / 2 - a +°'<( 1 ) operations per point. The storage 
space requirements (for the precomputation data) is T a+0 "^ = 7 ia + »( 1 ) bits. The 
analysis of the case a <E (1/4, 1/2] is analogous. 

We compare the efficiency of our proposed method with Gourdon's implementa- 
tion [G] of the Odlyzko-Schonhage algorithm. To this end, consider that in order 
to locate zeros near t — 10 16 to within ±10 -8 say, one expects pa 8 evaluations of 
Z(t) are required per zero (see [O] p. 80, and [G] p. 26). The mean spacing of zeros 
near 10 16 is about 2ir/ log(10 16 /(27r)) pa 0.18. Therefore, one expects to evaluate 
Z(t) at points with 1/8 x 0.18 pa 0.02 mean spacing. 

In particular, to locate 2 x 10 9 consecutive zeros near t — 10 16 one expects to 
compute Z(t) at 8 x 2 x 10 9 = 1.6 x 10 10 successive points with mean spacing 0.02. 
Results of timing tests (see last entry in Tableland footnote 3 in Section 3), suggest 
our method will consume about 9 minutes to compute Z(t) at 10 5 such points. By 
extrapolation, we expect the method to consume (9/60) x (1.6 x 10 5 ) = 24,000 
hours to compute Z(t) at 1.6 x 10 10 such points. So, we extrapolate that the 
method will require 24,000 hours to locate 2 x 10 9 zeros of Z(t) near t = 10 16 . 
By comparison, Gourdon's implementation of the Odlyzko-Schonhage algorithm 
(see [G], p. 28) consumes 49.5 hours to locate the same number of zeros at that 
height. Thus, Gourdon's implementation is approximately 485 times faster than 
our method for that task. In turn, our method is approximately 437 times faster 
than Icalc^s direct Riemann-Siegel formula; see Table 1, Section 3. 

However, if one is interested in finding a smaller set of zeros, then our method 
might be more suitable, both in terms of timings and the required programming 
effort. This is because one does not expect the time requirement of the Odlyzko- 
Schonhage algorithm to decrease significantly as the size of the set of zeros to be 
located decreases, whereas the time requirement of our method becomes substan- 
tially less. The reason is near t = 10 16 , and with 10 9 zeros to be located, we are 
working in the region a ps 1/2 in corollarv ll.31 So the running time of our method 
is roughly linear in the number of zeros to be found. For example, to locate 2 x 10 8 
zeros to within ±10~ 8 near t — 10 16 , we expect our method to consume 10 times 
less than in the previous scenario, or pa 2, 400 hours. 

We remark our implementation of the method will be available in the next release 
of Michael Rubinstein's Icalc; see [Rlj . 

2. Proof of Theorem 11.11 

In view of formula ()1.2ll . and the remarks made thereafter, in order to compute 
Z(t) with polynomial accuracy, it suffices to numerically evaluate the main sum 



(2.1) 




n=i 
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with polynomial accuracy. Given T > 100, and fixed numbers a € [0, 1/2], k > 0, 
we wish to evaluate F(t) for many values of t E [T,T + T a ] to within ±T~ K . Let 
ri2 := [y/T/ (2ir)\ . Since a 6 [0, 1/2], then rii, which is the length of the main sum 
(I2.1j) , is equal to either n 2 or n 2 + 1 . Without loss of generality, we may assume the 
length of the main sum is n 2 . Let M := min{[T Q ] , n 2 }. Note M and n 2 depend 
only on T. Then, there exist unique integers Mi,m > (also depending only on 
T), with n 2 = 2 m M + Mi, and Mi < 2 m M. So the main sum can be written as 



E 



a it log n 



E 



= it log n 



jn '■ — ' \/n 

Kn<M v M<n<2M v 



(2.2) 



Initial sum 



+ E 



it logn 



+ 



E 



At log n 



2 m - 1 M<n<2 m M 2™M<n<2™M+M l V n 



Tail sum 



The "initial sum" in (|2~2J) can be evaluated directly to within ±r _K_1 in 10M < 
10T a operations on numbers of [(10k + 10) log T] bits, say. Thus, we may focus 
our attention on computing (to within T~ K_1 ) the subsums: 



2 1+1 M-1 



(2.3) V /G{0,l,...,m-1}, 

n=2 l M v 



and 



2 m M+M 1 e i t \ 0&n 

E .AT 



n=2 m M 



Tail sum 



A direct evaluation of all of the subsums (|2.3p requires T ,1 / 2+0 * ] ( 1 ^ operations. 
But computing the individual terms in these subsums involves many common steps 
for a lot of different choices of t € [T,T + T a ] . We take advantage of this to obtain 
a substantially lower running times. So consider one of the subsums on the left side 
of (|2.3|) . Split it into consecutive blocks of length 2 l . This gives, 



(2.4) 



2 !+1 M-l i4 w„ a'M+a'-l itlo „„ 2 ! M+2(2 ! )-l - tl 



E 



E 



=2 ! M 



=2 ! M 



E 

n=2 ! Af+2' 



+ 



First block 
2'M+3(2')-l eltlog 

n=2'M+2(2 ! ) v 



Third block 



Second block 



2 1 M+M2'-1 iUog „ 

V 



n=2 l M+(M-l)2 l 



M th block 



Notice for any I G {0, 1, . . . , m — 1}, we have a total of 2 ( M /2 l = M blocks. So the 
total number of blocks for any such I is exactly M. As for the "Tail sum" in (|2.3I) . 
we use blocks of length 2 m . This gives a similar outcome as in (|2.4I) . except now 
the total number of blocks is M' = [(-^l + l)2~ m J , which is at most M, and there 
is possibly a "Remainder block" of length < 2 m , 
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2 m M+M 1 gitlogn 

(2.5) Hu, Ml ,m{t):= V — where M' := 



n=2 m M+M'2" 



Mi + 1 



< M. 



Remainder block 

Other than the remainder block TZM l M 1 . m (t), all our blocks are members of the set 

2'M+(r+i)2«-i Hlogn 

(2.6) V le{0,l,...,m}, re{0,l Jf-1}. 

n=2'M+r2' 

(when I = to, only the values r < M' are relevant to the algorithm). Let 

(2.7) v := v Wr = 2 l M + r2 l , K := K t =2 l . 

Then the v th block (the one starting at v := v^ r ) can be written in the form 

K ~ 1 p it\og{v+k) K ~ 1 it log(l+fe/ti) 

(2.8) V - e * tl0 ^ V =: e itlos *i^(i) . 

The function F Vt K{t) is a band-limited function; that is, its spectrum is limited 
to a finite interval, which in this case is the interval [0,log(l + K/v)]. In more 
technical terms, a band-limited function can be defined as a function whose Fourier 
transform is a tempered distribution with compact support. There exists a clever 
method to compute band-limited functions, which we reproduce in the Appendix 
with slight modifications (see [D], p. 88, for an in-depth discussion and history of 
the method). To apply band-limited interpolation, first note by construction, 

, ^ K K, 2 l 1 

V ' ' v v hr 2 l M + r2 l ~ M ' 

for all I g {0, 1, . . . , to}, and r g {0, 1, . . . , M — 1}. Therefore, the spectrum of the 
functions F v .x(t) is always contained in the interval [0, M ]. As for the remainder 
block K M ,Mum{t), let 

(2.10) v' := 2 m M + M'2 m , K' := Ml - M'2 m , (note K' < 2 m ), 
then write 

v'+K' itlogn K ' itlog(l+k/v') 

(2-H) f« ^ to ^Tk 

=:e itlosv ' F v , iK ,(t). 
Then similarly to the bound (|2.9p . we have 

K' Mi - M'2 m 2 m 1 

(2.12) — = < < — . 

v ' v' 2 m M + M'2 m ~ 2 m M + M'2 m ~ M 
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In particular, the spectrum of F v i t K>(t) is also contained in [0, M ]. So, we may 
apply formulas (|4.3|) and (|4.4|l in the Appendix with G(t) = F Vl ic(t), and G(t) = 
Fv',K'(t), and (in both cases), 



(2.13) t = M-\ /? = 3t, A = (/3 + t)/2 = 2t , ex = (/3 - r)/2 = r , 
say. We then appeal to the bounds (|4.5[) and (|4.6[) in the Appendix to obtain 



A /mr\ sin X(nir/B — t) , . 

for any c > 1, where 



smh(c) ^c 2 - £ 2 U 2 ' 1 1 ^ v / tT+fe ' 

By similar calculations, we obtain an analogous formula to (|2.14l) for F v i : K'{t), 
which corresponds to the remainder block lZM,M u m(t)- Now, define 

A v-^ / n7r \ sin X(nn/8 — t) , 

(2.16) F vM t) = ~ E (-) AK -t) ft( ^ - <} ' 

P |TMr//J-t|<c/£i V ' 7 V /P ; 

Also define F v ^K>(t) in an analogous way to (|2.16[) . Note F v> k (also, F v > y K'(t)) is 
a sum of < 2cj3/e\ < 6c terms. Then put together, we have 



it log n m-lAf-1 



l<n<Vl7(2^) l<n<M V ^ i=0 r=0 



sum 



(2.17) 77~^ ' Initial ; 

Main sum 

M'-l 

+ E e ltlos ^- F Vm ^ Km {t) + P v ,, K ,(t)+£ . 



r=0 

where, 

m-lM-l M'-l 

(2.18) |f | < E E l £ "«,r,*«l + E l f ««.r,K,l + \£«,K>\ < 20e- c VT 

1=0 r=Q r=Q 

We choose c = (k+2) logT, so that \£\ < T~ K ~ . Lastly, as we will soon explain, the 
right side of (|2.17j) can now be evaluated to within ±T~ K using 

T q+o k (i) p era tions 

provided we precompute the entries of the following tables (to within ±(6c)~ 1 T~ K ~ 1 
each) : 
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F VhT>Kl f — J :l€ [0,m-l],rG [0,M-1], — e [T - 2c/e 1 , T + T a + 2c/e 1 ] 
Fv m ,„,K m (jf) : r e [0, M' - 1] , ^ G [T - 2c/e 1 ,T + T a + 2c/e{ 

Fv '- K ' (t ) : T e [T _ 2c/ei ■ T + T " + 2c/ei] 

(2.19) 

To compute the entries in the first table to within ±(6c)~ 1 T~ K ~ 1 requires a number 
of operations of at most 

m-l M-l 



J2 E rio/3(r Q +4 C / ei )i^ 



(2.20) r= ° 

v y m-l M-l 



< E E !0(10+ 12c)2' < 150c 2 m M < 200cT 1/2 , 



;=0 r=0 



where the operations are carried out on numbers of [(10ft+10) logT] bits, say. Note 
the second inequality in (|2.20p used the fact /3/ei = 3, which is true by construction. 
Similarly, the second and third tables in (|2.19[) require at most 200c 2 m M' and 
200c 2 m such operations, respectively. Since M' < M, and 2 m M < T 1 / 2 , then the 
total cost of precomputing the entries of all three tables in (|2.19p is less than 

(2.21) 600cT 1/2 < 600(K + 2)T 1/2 logT 

operations. Once the precomputation is done, the right side of (|2.17p can be com- 
puted, with the aid of formula (I2.16p . to within ±T~ K ~ 1 in less than 



20M + 20mM[2/3c/ei] + 2QM'\2j3c/e{\ +20[2^c/ei] 

< 1000(K + 2)T Q (logT) 2 



operations. Finally, the storage space requirement for precomputing the three tables 
in (|2.19p is at most 

(2.23) 3mM[2/3c/eil \{k + l)logT] < 1000(k + 2) 2 T"(logT) 3 

bits. 

3. Comparison with the Riemann-Siegel formula 

In this section, our algorithm is denoted by BLFI (band-limited function inter- 
polation), and the Riemann-Siegel formula is denoted by RS. 

The performance of the BLFI algorithm is compared with that of a relatively 
optimized version of the RS formula included in the Icalc library. The Icalc library is 
a C++ software developed by Michael Rubinstein to compute values of L- functions, 
including the zeta function. It can be downloaded at [Rlj . 
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The BLFI algorithm was also coded in C++, essentially as described in the 
previous section^j. We used double-precision arithmetic, which allows a maximum 
of 16 digits of accuracj0. 

We used the BLFI algorithm and the RS formula to evaluate C(l/2 + it) on a 
grid of points of the form 

(3.1) [T ,T + nA] , n=l,...,N. 

Here, A > is the spacing (or density) of pointtQ, and N is the number of points in 
the grid. In case of the BLFI algorithm, an upper bound for the truncation error £ 
was also chosen. The output of the precomputation needed by the BLFI algorithm 
is stored in dynamic memory, not saved in files. 

The running times of the BLFI algorithm, presented in tables below, account for 
everything, including the precomputation. Also, when T > 10 10 , timings for the 
RS formula were obtained by evaluating zeta at a number of points M < N, then 
multiplying the running time by N/M. 

In the tables to follow, the running times are formatted as x m y s , where x and 
y are the numbers of minutes and seconds, respectively, consumed by the method 
under test. Finally, our tests were carried out on a personal Mac machine. The 
same compiler options were used for both programs. 

Tables [2] and [3] list running times for BLFI and RS at various heights. The last 
column is the ratio of the time consumed by RS to the time consumed by the BLFI 
algorithm. At each height, we used a grid of 10 5 equidistant points. In Table [21 the 
spacing of the grid points is A = 0.01, and in Table[3]it is A = 0.1. The truncation 
error was chosen to satisfy £ < 10~ 8 in both tables. 

TABLE 2. Running times of BLFI and RS with N = 10 5 , A = 0.01, and £ < 10" 8 . 



T 


RS 


BLFI 


Ratio 


10 8 


0m 18s 


0m 7s 


2 


10 10 


2m 56s 


0m 12s 


14 


10 12 


29m 10s 


0m 35s 


50 


10 14 


348m 0s 


2m 35s 


134 


10 16 


3700m 0s 


8m 28s 


437 



Tables [5] and [3] indicate the running time of RS grows like T 1 / 2 , as expected, while 
the running time of the BLFI algorithm grows roughly like T 1 ' 4 , also as expected. 
As T increases, the savings achieved by the BLFI algorithm are accentuated. For 
example, when T = 10 16 , and A = 0.1, the BLFI algorithm is more than 200 times 
faster than RS, but it is only 2 times faster when T = 10 8 . 



Our implementation of the BLFI algorithm differed from the description in the previous 
sections only in that we centered the band-limited functions F v K so their spectrum is in 
[-0.5 log(l + K/v), 0.5 log(l + K/v)] rather than [0, log(l + K/v)]. This allows the recovery 
of values of the functions F v k using less frequent sampling; see [0"|. pp. 90-91. 

2 Working with 30-digit arithmetic takes about 10 times longer for both RS and BLFI. But the 
slowdown can be made much less severe by following some of the tricks in [G] p. 13 

^The grid of points where the zeta function is to be evaluated need not at all be uniform. What 
matters is the average spacing of the grid points. 
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TABLE 3. Running times of BLFI and RS with N = 10 5 , A = 0.1, and £ < 10 -8 . 



T 


RS 


BLFI 


Ratio 


10* 


0m 18s 


Om 9s 


2 


10 10 


2m 56s 


Om 27s 


6 


10 12 


29m 10s 


lm 34s 


18 


10 14 


348m Os 


5m 50s 


60 


10 16 


3700m Os 


18m 45s 


205 



We measure the sensitivity of the running time of the BLFI algorithm to per- 
turbations in the values of its parameters and input. Table |4] indicates the running 
time of the BLFI algorithm grows roughly linearly with the number of grid points 
N, as soon as N is large enough, which is expected. Table[5]shows the running time 
is not radically affected by changes in the error allowance £. This is not surprising 
either, because demanding higher precision increases the number of terms in the 
BLFI formula (|2.14|) only logarithmically. 

TABLE 4. Running times of BLFI with T = 10 12 , A = 0.1, and £ < 10~ 8 . 



N 


BLFI 


10 3 


0m 2s 


10 4 


0m 10s 


10 5 


lm 34s 


10 6 


15m 27s 



TABLE 5. Running times of BLFI with T = 10 12 , A = 0.1, and N = 10 5 . 



£ 


BLFI 


10~ 


6 


lm 28s 


10" 


8 


lm 34s 


io- 


10 


lm 41s 


io- 


12 


lm 48s 


io- 


14 


lm 53s 



Lastly, Table [S] lists timings for the BLFI Algorithm for various values of A. As 
A increases, the grid expands, so more effort is exerted during the precomputation. 
This explains the slowdown in the method as A increases. 

4. Appendix: band-limited interpolation 

Much of the material in this section is contained in [O] . pages 88-93. For the 
convenience of the reader, it is reproduced here with slight modifications. Let 



(4.1) 



G(t) = J g{x)e ixt dx , 
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TABLE 6. Running times of BLFI with T = 10 12 , N = 10 5 , and £ < 10" 8 . 



A 


BLFI 


0.01 


0m 35s 


0.05 


lm 10s 


0.1 


lm 34s 


0.2 


2m 8s 


0.4 


2m 54s 



be a band- limited function, where g{x) is a (finite) linear combination of delta 
functions supported on (— r, r). It is well-known G can be recovered completely 
from its values at the grid points {titt/t | n £ Z}. But this recovery process is not 
efficient, because it requires the values of G at many sample points titt/t. 

The idea of a band-limited interpolation technique is to sample G on a denser 
grid, say {mr/(3\n £ Z}, where (3 > r, after which G(t) can be recovered much 
more quickly, from its values at only a few grid points nn//3 that are close to t. 

Specifically, choose (3 > r, and define A := (/3+r)/2, e\ :— (/?— r)/2. Let J denote 
the characteristic function of the interval [—A, A], and let H be any continuous 
function with total mass 1 supported on [— ei, ei]. Define / := I*H, the convolution 
of / and H, and let / and H denote the Fourier transforms: 



/OO P oo 

f(x)e~ ltx dx, H{t):= H{x)e7 ltx dx . 
-oo J — oo 

By construction, / = I * H is supported on [— /3, /3], and it is identically 1 on 
[— r, t]. And by hypothesis, g(x) is supported on [— r, r]. Therefore, f(x)g(x) 

■ the only difference is the left side involve smoothing in the "redundant" inter- 
val [r, ft] U [— /?, — t] . The latter observation is what gives the band-limited technique 
its edge. This is because some Fourier analysis yields, 



f°° A 
G(t) = I f(x)g(x)e M = - £ G(mr//3) /(t - nTr/0) 



— OO 

OO 



= J E G(rm/j3) _ H(t-nrr/P), 

n— — oo x ' ' 

In particular, we can try to choose the smoothing function if so as to accelerate 
the convergence of the right side in (j4.3[) . There are many choices for if (e.g. a 
Gaussian). Following Odlyzko [D], we choose the following function, which still 
closely resembles a Gaussian, but leads to smaller big-O constants (see [L] and [O] . 
P-92): 



(4.4) • ^.f ^. 

smh(c) ^/ c 2 _ e 2 u 2 

where c > 1 is any fixed number. Finally, we only sum the terms with \nir/ /3 — 1\ < 
c/ei in formula f|4. 3[) . which yields a truncation error £i satisfying 
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(4.5) |£i|<2sup|G 
This is bounded by (see [Q], p. 92, and [L]) 



/ 


H(u) 


' \u\>c/ ei 


u 



du , 



(4.6) |^| <2sup|G(t)|log^_!<6sup|G(i)|e- 

teR 1 — e teR 
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